Method and device for the natural-language recognition of a vocal expression

ABSTRACT

The invention relates to a method and a device for the natural-language recognition of a vocal expression. A vocal expression of a person is detected and converted into a voice signal to be processed by a voice recognition device. Afterwards, the voice signal is analyzed at the same time or sequentially in a plurality of voice recognition branches of the voice recognition device using a plurality of grammars, wherein the recognition process is successfully completed if the analysis of the voice signal in at least one voice recognition branch supplies a positive recognition result.

The invention relates to a method and a device for the natural language recognition of a vocal expression, in particular on the basis of a voice recognition system which for example can be executed on an electronic data processing system.

Voice recognition systems are used in a variety of application areas. For example, they are used in combination with office applications for the capture of texts, or in combination with technical devices for control and command input. Voice recognition systems are also used to control information and communications devices such as radios, mobile telephones and navigation systems. Moreover, companies use language dialog systems for customer service and information; these systems are likewise based on voice recognition systems. The present patent application relates to the latter.

In automatic voice recognition, so-called speech models are used to assess word sequences. These models are based on a grammatical set of rules, also referred to as a grammar. Grammars define unambiguous sets of rules. Voice recognition systems based on grammars exhibit high recognition reliability.

Particularly in technical customer service, for example in connection with mobile telephones and tariffs, increasingly efficient voice recognition systems are called for. Understanding the countless possible customer expressions requires very large grammars, and their comprehensiveness comes at the expense of recognition reliability.

Every automated language recognition process is based on the comparison of a concrete caller expression with stored words or statements. Only in the case of a match is an expression considered recognized, and only then can it trigger a specified action. This gives rise to a “grammar dilemma”: small grammars have a narrow scope of recognition but, in return, better recognition reliability. Large grammars, conversely, cover a broad spectrum of expressions, but their recognition reliability drops.

The object of the invention is therefore to realize a language recognition method and system that combines a large scope of recognition with grammars of small scope. What is sought is a grammar model that exploits the positive aspects of large and small grammars without incurring their negative aspects.

This object is achieved in accordance with the invention by a method and a device with the features of the independent patent claims.

Preferred embodiments and additional advantageous features of the invention arise from the dependent claims.

The inventive method is based on the detection of a vocal expression of a person and its conversion into a voice signal to be processed by a voice recognition device, the analysis of the voice signal, at the same time or sequentially, in a plurality of voice recognition branches of the voice recognition device using a plurality of grammars, and the successful completion of the recognition process if the analysis of the voice signal supplies a positive recognition result in at least one voice recognition branch.

In a first embodiment of the invention, the vocal expression is analyzed simultaneously using two or more independent grammars. In this case the vocal expression of a person initiates two or more simultaneous recognition processes, which analyze and assess the vocal expression independently of each other. For example, a comparatively small main grammar with a narrow scope of recognition is placed alongside a more comprehensive secondary grammar with an expanded scope of recognition. The two grammars have no common intersection.

A second embodiment of the invention relates to a grammar cascade. In this model, various grammars are used one after the other, that is, sequentially. As soon as a grammar supplies a recognition result, the cascade is exited and the recognition process is concluded. In this method, 100% of all expressions to be recognized are compared to the first grammar. Depending on the efficiency and arrangement of this grammar, a portion of, for example, 20% of the expressions is not recognized and is forwarded to a second recognition step. If a third recognition step is integrated, it can be assumed that a portion of, for example, 5% of all incoming expressions reaches this third recognition step.
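The example proportions given for the cascade imply an average number of grammar comparisons per incoming expression. The following is a minimal sketch of that arithmetic, assuming only the illustrative figures named in the text (100%, 20% and 5%); the function name is hypothetical and not part of the described method.

```python
def mean_recognition_steps(fraction_reaching_step):
    """Average number of grammar comparisons per incoming expression.

    fraction_reaching_step[i] is the share of ALL incoming expressions
    that reaches recognition step i (the first entry is always 1.0).
    """
    return sum(fraction_reaching_step)


# Illustrative figures from the text: 100% reach step 1,
# 20% reach step 2, and 5% reach step 3.
example = [1.00, 0.20, 0.05]
print(round(mean_recognition_steps(example), 2))  # 1.25
```

Under these assumed figures, an expression is compared against 1.25 grammars on average, although three grammars are available.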

With both recognition methods, a comprehensive spectrum of expressions is to be covered by a plurality of “smaller” grammars which, in combination, nevertheless guarantee high recognition reliability. As described above, this can take the form of a simultaneous or a successive recognition process.

The two preferred exemplary embodiments of the invention will be described in the following with the help of the drawings.

FIG. 1 shows schematically a first embodiment of the voice recognition system with voice recognition branches working in parallel.

FIG. 2 shows schematically a second embodiment of the voice recognition system with sequentially working, cascaded voice recognition systems.

In accordance with FIG. 1, a vocal expression of a person, which is present as a voice signal 10, is fed simultaneously to two voice recognition branches and analyzed using two grammars 12 and 14 (Grammar A and Grammar B). The two grammars 12, 14 have no common intersection, that is, they are based on different sets of rules. The parallel processing of the voice signal increases the analysis effort and with it the computing load required when the method is run on a computer. This is, however, compensated for by more rapid recognition and significantly improved recognition reliability.

A comparison 16 of the voice signal to the grammar (A) 12 leads either to a positive recognition result (Yes) or a negative recognition result (No). Likewise, a comparison 18 of the voice signal to the grammar (B) 14 leads either to a positive recognition result (Yes) or a negative recognition result (No). With the simultaneously working grammars 12, 14, four possible recognition cases arise in the recognition process, which can be evaluated in different ways by a logic 20.

| Recognition case | Grammar 1 (Main grammar) | Grammar 2 (Secondary grammar) | Overall result |
|---|---|---|---|
| 1 | No result (No) | No result (No) | Not recognized |
| 2 | Result (Yes) | No result (No) | Recognized |
| 3 | No result (No) | Result (Yes) | Recognized |
| 4 | Result (Yes) | Result (Yes) | Recognized |

Recognition cases 1 through 3 are unproblematic insofar as they supply unambiguous results: case 1 forces a non-recognition of the voice signal and with that a rejection, Position 24. Cases 2 and 3 each supply only one positive result and thus unambiguously indicate a recognition of the voice signal, Position 22.

For case 4, in which both grammars 12, 14 have recognized the voice signal 10, on the other hand, a special method logic must be implemented, since the result is not unambiguous. Said method logic can rigidly decide in favor of grammar 12, can be oriented to the recognition reliability (Confidence Level) or form a hybrid of both (e.g.: result from grammar 14 is only used if recognition reliability is higher than in the case of grammar 12 by a predefined value).
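The evaluation of the four recognition cases, including the hybrid rule for case 4, can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the invention: the `Result` type, the `combine` function, the confidence scale from 0 to 1 and the default margin of 0.1 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    text: str          # recognized expression
    confidence: float  # recognition reliability, assumed to lie in [0, 1]


def combine(main: Optional[Result], secondary: Optional[Result],
            margin: float = 0.1) -> Optional[Result]:
    """Evaluate the results of two parallel grammars.

    None stands for a negative recognition result (No).
    """
    if main is None and secondary is None:
        return None       # case 1: not recognized, rejection
    if secondary is None:
        return main       # case 2: only the main grammar recognized
    if main is None:
        return secondary  # case 3: only the secondary grammar recognized
    # Case 4: both recognized. Hybrid rule: use the secondary result only
    # if its confidence exceeds that of the main grammar by the margin.
    if secondary.confidence > main.confidence + margin:
        return secondary
    return main
```

In this sketch, `margin=0.0` corresponds to a decision purely by recognition reliability (with ties going to the main grammar), while `margin=float("inf")` corresponds to a rigid decision in favor of the main grammar.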

In place of two parallel voice recognition branches, three or more voice recognition branches working in parallel can also be provided in accordance with the invention.

FIG. 2 shows another preferred embodiment of the invention. Here there are several grammars 12, 14 and 26 (Grammars A, B and C) connected sequentially to each other in the form of a cascade. That is, in the case of the grammar cascade the various grammars 12, 14 and 26 are not addressed simultaneously, but rather successively. Schematically the recognition operation can be described in the following manner: At the moment in which a grammar supplies a positive recognition result, the cascade is exited and the recognition process is concluded, Position 22.

The voice signal 10 is first fed to a first grammar (A) 12 and analyzed there. A comparison 16 of the voice signal to the grammar (A) 12 leads either to a positive recognition result (Yes), in which case the recognition process is successfully concluded, or a negative recognition result (No), in which case the voice signal is fed to a second grammar (B) 14 for further analysis. A comparison 18 of the voice signal 10 to the second grammar (B) 14 leads either to a positive recognition result (Yes), in which case the recognition process is successfully concluded, or a negative recognition result (No), in which case the voice signal is fed to a third grammar (C) 26 for further analysis. A comparison 28 of the voice signal to the third grammar (C) 26 leads either to a positive recognition result (Yes), in which case the recognition process is successfully concluded, or a negative recognition result (No), in which case the voice signal is rejected as not recognized, Position 24.
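The cascade described above can be sketched as a simple loop that exits at the first positive result. As a simplifying assumption, the voice signal is modeled here as already transcribed text and each grammar as a lookup table of phrases; the helper names, phrases and result labels are hypothetical.

```python
from typing import Callable, Optional, Sequence

# A grammar is modeled as a callable that returns a result label,
# or None for a negative recognition result (No).
Grammar = Callable[[str], Optional[str]]


def phrase_grammar(phrases: dict) -> Grammar:
    """Build a toy grammar from a phrase-to-result table (assumption)."""
    return lambda signal: phrases.get(signal)


def recognize_cascade(signal: str, grammars: Sequence[Grammar]) -> Optional[str]:
    """Feed the signal to each grammar in turn; exit on the first hit."""
    for grammar in grammars:
        result = grammar(signal)
        if result is not None:
            return result  # recognition successful, cascade is exited
    return None            # all grammars have been run: not recognized


grammar_a = phrase_grammar({"check balance": "BALANCE"})
grammar_b = phrase_grammar({"change tariff": "TARIFF"})
grammar_c = phrase_grammar({"report lost phone": "LOST_PHONE"})
cascade = [grammar_a, grammar_b, grammar_c]

print(recognize_cascade("change tariff", cascade))  # TARIFF
print(recognize_cascade("order pizza", cascade))    # None
```

Adding a fourth or further grammar only means appending it to the list; the loop itself does not change.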

In this method, 100% of all incoming voice signals 10 are first compared to the first grammar 12. Depending on the efficiency and design of this grammar, a portion of the vocal expressions will not be recognized. These non-recognized voice signals are then submitted to the second recognition step. Depending on the efficiency and design of the second recognition step, a portion of these voice signals is again not recognized and is then submitted to the third recognition step.

The advantage of the grammar cascade vis-à-vis the method of simultaneous recognition by a plurality of grammars lies in the fact that there is no additional computing load, since the voice signal 10 is compared with only one grammar at any point in time. The successive recognition, however, necessarily increases the latency in the system.
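The trade-off between computing load and latency can be illustrated with a deliberately simple cost model. It rests on assumptions that are not taken from the description: each grammar comparison has a fixed cost in arbitrary time units, parallel branches run truly concurrently and all run to completion, and the function names are hypothetical.

```python
def parallel_cost(costs):
    """All branches run at once: work is the sum, latency the slowest branch."""
    return {"work": sum(costs), "latency": max(costs)}


def cascade_cost(costs, hit_index=None):
    """Branches run one after another until the grammar at hit_index hits.

    hit_index=None means that no grammar recognized the signal,
    so every recognition step was run.
    """
    steps = costs if hit_index is None else costs[:hit_index + 1]
    return {"work": sum(steps), "latency": sum(steps)}


costs = [2, 5, 9]  # assumed cost per grammar comparison (A, B, C)
print(parallel_cost(costs))    # {'work': 16, 'latency': 9}
print(cascade_cost(costs, 0))  # {'work': 2, 'latency': 2}
print(cascade_cost(costs, 2))  # {'work': 16, 'latency': 16}
```

In this model the cascade never does more work than the parallel arrangement, but its latency grows with every additional recognition step that has to be run.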

In place of three cascaded voice recognition branches, four or more sequentially working voice recognition branches can also be provided in accordance with the invention.

LIST OF THE REFERENCE SYMBOLS

-   10 Voice signal
-   12 Grammar A
-   14 Grammar B
-   18 Branch A
-   20 Branch B
-   22 Recognition successful
-   24 Recognition not successful
-   26 Grammar C
-   28 Branch C

1. A method for the natural language recognition of a vocal expression, with the steps:
   detection of the vocal expression and conversion into a voice signal to be processed by a voice recognition device,
   sequential analysis of the voice signal in a plurality of voice recognition branches of the voice recognition device using a plurality of grammars and
   successful completion of the recognition process of the vocal expression, in case the analysis of the voice signal supplies a positive recognition result in at least one voice recognition branch,
   characterized by the steps:
   a) feeding of the voice signal to a first voice recognition branch comprising a first grammar for analysis of the voice signal,
   b) analysis of the voice signal by the first grammar, wherein in the case of a recognition of the vocal expression a positive first recognition is generated and the recognition process is concluded and in the case of a non-recognition of the vocal expression a negative first recognition result is generated,
   c) wherein in the case of negative recognition result the voice signal is fed to a further voice recognition branch comprising a further grammar,
   d) analysis of the voice signal by the further grammar, wherein in the case of a recognition of the vocal expression a positive recognition is generated and the recognition process is concluded and in the case of a non-recognition of the vocal expression a negative first recognition result is generated,
   e) wherein in the case of a negative recognition result the method continues with step (c) until the grammars of all existing voice recognition branches have been run.
 2. The method according to claim 1, characterized in that the sets of rules of the grammars do not exhibit a common intersection.
 3. The method according to claim 1, characterized in that a first grammar analyzes frequently occurring vocal expressions, a second grammar analyzes less frequently occurring vocal expressions and any further grammar analyzes even less frequently occurring vocal expressions.
 4. The method according to claim 1, characterized in that when both the first and the second recognition result are positive, the recognition result supplied by the first grammar is used.
 5. The method according to claim 1, characterized in that when both the first and the second recognition result are positive, the recognition result whose recognition reliability is the greatest is used.
 6. A device for the natural language recognition of a vocal expression which comprises: means for detection of the vocal expression and for conversion into a voice signal to be processed by a voice recognition device, a voice recognition device with a plurality of voice recognition branches, wherein each voice recognition branch exhibits a grammar for the analysis of the voice signal, wherein the voice signal is fed to the voice recognition branches sequentially, and means for the control and evaluation of the recognition process in dependency on the recognition result of at least one voice recognition branch.
 7. A computer program with a program code which executes on a computer a method according to claim 1.
 8. A computer program product which comprises program code executable on a computer for the carrying out of the method according to claim 1.
 9. The method according to claim 2, characterized in that a first grammar analyzes frequently occurring vocal expressions, a second grammar analyzes less frequently occurring vocal expressions and any further grammar analyzes even less frequently occurring vocal expressions.
 10. The method according to claim 2, characterized in that when both the first and the second recognition result are positive, the recognition result supplied by the first grammar is used.
 11. The method according to claim 3, characterized in that when both the first and the second recognition result are positive, the recognition result supplied by the first grammar is used.
 12. The method according to claim 2, characterized in that when both the first and the second recognition result are positive, the recognition result whose recognition reliability is the greatest is used.
 13. The method according to claim 3, characterized in that when both the first and the second recognition result are positive, the recognition result whose recognition reliability is the greatest is used.
 14. A computer program with a program code which executes on a computer a method according to claim 2.
 15. A computer program with a program code which executes on a computer a method according to claim 3.
 16. A computer program with a program code which executes on a computer a method according to claim 4.
 17. A computer program with a program code which executes on a computer a method according to claim 5.
 18. A computer program product which comprises program code executable on a computer for the carrying out of the method according to claim 2.
 19. A computer program product which comprises program code executable on a computer for the carrying out of the method according to claim 3.
 20. A computer program product which comprises program code executable on a computer for the carrying out of the method according to claim 4.